Search CORE

8 research outputs found

A Class of MSR Codes for Clustered Distributed Storage

Author: Choi Beongjun
Moon Jaekyun
Sohn Jy-yong
Publication venue
Publication date: 06/01/2018
Field of study

Clustered distributed storage models real data centers where intra- and cross-cluster repair bandwidths are different. In this paper, exact-repair minimum-storage-regenerating (MSR) codes achieving capacity of clustered distributed storage are designed. Focus is given on two cases:

\epsilon=0

and

\epsilon=1/(n-k)

, where

\epsilon

is the ratio of the available cross- and intra-cluster repair bandwidths,

n

is the total number of distributed nodes and

k

is the number of contact nodes in data retrieval. The former represents the scenario where cross-cluster communication is not allowed, while the latter corresponds to the case of minimum cross-cluster bandwidth that is possible under the minimum storage overhead constraint. For the

\epsilon=0

case, two types of locally repairable codes are proven to achieve the MSR point. As for

\epsilon=1/(n-k)

, an explicit MSR coding scheme is suggested for the two-cluster situation under the specific condition of

n = 2k

.Comment: 9 pages, a part of this paper is submitted to IEEE ISIT201

arXiv.org e-Print Archive

Crossref

An Analytical Model-based Capacity Planning Approach for Building CSD-based Storage Systems

Author: Byun Hongsu
Choi Beongjun
Han Jungwook
Jamil Safdar
Kim Changsoo
Kim Youngjae
Lee Myungcheol
Park Sungyong
Publication venue
Publication date: 07/06/2023
Field of study

The data movement in large-scale computing facilities (from compute nodes to data nodes) is categorized as one of the major contributors to high cost and energy utilization. To tackle it, in-storage processing (ISP) within storage devices, such as Solid-State Drives (SSDs), has been explored actively. The introduction of computational storage drives (CSDs) enabled ISP within the same form factor as regular SSDs and made it easy to replace SSDs within traditional compute nodes. With CSDs, host systems can offload various operations such as search, filter, and count. However, commercialized CSDs have different hardware resources and performance characteristics. Thus, it requires careful consideration of hardware, performance, and workload characteristics for building a CSD-based storage system within a compute node. Therefore, storage architects are hesitant to build a storage system based on CSDs as there are no tools to determine the benefits of CSD-based compute nodes to meet the performance requirements compared to traditional nodes based on SSDs. In this work, we proposed an analytical model-based storage capacity planner called CSDPlan for system architects to build performance-effective CSD-based compute nodes. Our model takes into account the performance characteristics of the host system, targeted workloads, and hardware and performance characteristics of CSDs to be deployed and provides optimal configuration based on the number of CSDs for a compute node. Furthermore, CSDPlan estimates and reduces the total cost of ownership (TCO) for building a CSD-based compute node. To evaluate the efficacy of CSDPlan, we selected two commercially available CSDs and 4 representative big data analysis workloads

arXiv.org e-Print Archive

Secure Clustered Distributed Storage Against Eavesdropping

Author: Choi Beongjun
Moon Jaekyun
Sohn Jy-Yong
Yoon Sung Whan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/11/2019
Field of study

This paper investigates interplay among storage overhead, bandwidth requirement, and security constraint in distributed storage. In the model used in our analysis, storage nodes are dispersed in multiple clusters. When a node fails, necessary content gets restored by downloading data from different nodes that may possibly be in other clusters. The bandwidth required for transferring data for node repair is assumed more scarce for cluster-to-cluster links than the links connecting intra-cluster nodes. Eavesdropping takes place on links across clusters only, and a fraction of the total number of clusters is assumed compromised. When a cluster is compromised, any repair traffic going in and out of it is eavesdropped. For this clustered model with eavesdroppers, we analyze the security of distributed storage systems (DSSs) and provide guidelines on designing system solutions for securing the data. First, under the setting of functional repair, we derive a general upper bound on the secrecy capacity, the maximum data size that can be stored in DSSs with perfect secrecy. In the practically important bandwidth-limited regime where the node storage size is equal to the repair bandwidth, the upper bound is shown to be achievable through proposed code constructions. Moreover, we obtain a closed-form expression for the required system resources-node storage size and repair bandwidth-to store a given amount of data with perfect secrecy. Second, we investigate the behavior of secrecy capacity as the number of compromised clusters increases. According to our mathematical analysis, the secrecy capacity decreases as a quadratic function until the number of compromised clusters reaches a certain threshold. Finally, based on the fundamental relationship between the system resources and the secrecy capacity, we provide a guideline on balancing intra- and cross-cluster repair bandwidths depending on the given system security level

ScholarWorks@UNIST

Capacity of Clustered Distributed Storage

Author: Choi Beongjun
Moon Jaekyun
Sohn Jy-Yong
Yoon Sung Whan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 13/02/2017
Field of study

A new system model reflecting the clustered structure of distributed storage is suggested to investigate interplay between storage overhead and repair bandwidth as storage node failures occur. Large data centers with multiple racks/ disks or local networks of storage devices (e. g., sensor network) are good applications of the suggested clustered model. In realistic scenarios involving clustered storage structures, repairing storage nodes using intact nodes residing in other clusters are more bandwidth consuming than restoring nodes based on information from intra-cluster nodes. Therefore, it is important to differentiate between intra-cluster repair bandwidth and cross-cluster repair bandwidth in modeling distributed storage. Capacity of the suggested model is obtained as a function of fundamental resources of distributed storage systems, namely, node storage capacity, intra-cluster repair bandwidth, and cross-cluster repair bandwidth. The capacity is shown to be asymptotically equivalent to a monotonic decreasing function of number of clusters, as the number of storage nodes increases without bound. Based on the capacity expression, feasible sets of required resources which enable reliable storage are obtained in a closed-form solution. Specifically, it is shown that the cross-cluster traffic can be minimized to zero (i.e., intra-cluster local repair becomes possible) by allowing extra resources on storage capacity and intra-cluster repair bandwidth, according to the law specified in the closed form. The network coding schemes with zero cross-cluster traffic are defined as intra-cluster repairable codes, which are shown to be a class of the previously developed locally repairable codes

arXiv.org e-Print Archive

Crossref

ScholarWorks@UNIST

Secure clustered distributed storage against eavesdroppers

Author: Choi Beongjun
Moon Jaekyun
Sohn Jy-yong
Yoon Sung Whan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 24/02/2017
Field of study

This paper considers the security issue of practical distributed storage systems (DSSs) which consist of multiple clusters of storage nodes. Noticing that actual storage nodes constituting a DSS are distributed in multiple clusters, two novel eavesdropper models - the node-restricted model and the cluster-restricted model - are suggested which reflect the clustered nature of DSSs. In the node-restricted model, an eavesdropper cannot access the individual nodes, but can eavesdrop incoming/outgoing data for Lc compromised clusters. In the cluster-restricted model, an eavesdropper can access a total of l individual nodes but the number of accessible clusters is limited to Lc. We provide an upper bound on the securely storable data for each model, while a specific network coding scheme which achieves the upper bound is obtained for the node-restricted model, given some mild condition on the node storage size

arXiv.org e-Print Archive

Crossref

ScholarWorks@UNIST